Word boundary detection with mel-scale frequency bank in noisy environment

Authors

  • Gin-Der Wu
  • Chin-Teng Lin
Abstract

This paper addresses the problem of automatic word boundary detection in the presence of noise. We first propose an adaptive time-frequency (ATF) parameter for extracting both the time and frequency features of noisy speech signals. The ATF parameter extends the TF parameter proposed by Junqua et al. from single-band to multiband spectrum analysis, where the frequency bands help to make the distinction between speech and noise signals clear. The ATF parameter extracts useful frequency information by adaptively choosing proper bands of the mel-scale frequency bank. It raised the recognition rate by about 3% over a TF-based robust algorithm, which has itself been shown to outperform several commonly used algorithms for word boundary detection in the presence of noise. The ATF parameter also reduced the recognition error rate due to endpoint detection to about 20%. Based on the ATF parameter, we further propose a new word boundary detection algorithm that uses a neural fuzzy network (called SONFIN) to identify islands of word signals in a noisy environment. Owing to the self-learning ability of SONFIN, the proposed algorithm avoids the need to determine thresholds and ambiguous rules empirically, as conventional word boundary detection algorithms require. Compared with conventional neural networks, SONFIN can always find an economical network size for itself and learns quickly. Our results also showed that SONFIN’s performance is not significantly affected by the size of the training set. The ATF-based SONFIN achieved a recognition rate about 5% higher than that of the TF-based robust algorithm. It also reduced the recognition error rate due to endpoint detection to about 10%, compared with an average of approximately 30% obtained with the TF-based robust algorithm and 50% obtained with the modified version of the Lamel et al. algorithm.
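As a rough illustration of what "choosing proper bands of the mel-scale frequency bank" could involve, the sketch below spaces band edges evenly on the mel scale and then keeps the bands whose energy stands out most against a noise estimate. The mel conversion is the commonly used formula; the abstract does not give the ATF selection rule, so `pick_bands` and its energy-to-noise ratio criterion are hypothetical stand-ins, not the authors' method. All function and parameter names are assumptions made for this example.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (commonly used 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low_hz, f_high_hz):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced in mels."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 2)]


def pick_bands(frame_energies, noise_energies, k):
    """Hypothetical rule (not from the paper): keep the k bands with the
    highest ratio of frame energy to estimated noise energy."""
    ratios = [e / max(n, 1e-12) for e, n in zip(frame_energies, noise_energies)]
    ranked = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    return sorted(ranked[:k])
```

For example, `mel_band_edges(20, 0.0, 4000.0)` yields 22 edges that are dense at low frequencies and sparse at high ones, and `pick_bands` would then be applied per frame to the energies measured in those bands.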

Similar resources

A robust word boundary detection algorithm for variable noise-level environment in cars

This paper discusses the problem of automatic word boundary detection in the presence of variable-level background noise in cars. Commonly used robust word boundary detection algorithms always assume that the background noise level is fixed and set fixed thresholds to find the boundary of the word signal. In fact, the background noise level in cars varies during recording due to speed...

PCA-Based Speech Enhancement for Distorted Speech Recognition

We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image [1]. Much research for robust speech feature extraction has been done, but it rema...

A framework for robust MFCC feature extraction using SNR-dependent compression of enhanced mel filter bank energies

The Mel-frequency cepstral coefficients (MFCC) are the most widely used and successful features for speech recognition, but their performance degrades in the presence of additive noise. In this paper, we propose a noise compensation method for Mel filter bank energies and hence for MFCC features. This compensation method includes two steps: Mel sub-band spectral subtraction and then compression of Mel-Sub-ba...

Robust Speech Detection with Heteroscedastic Discriminant Analysis Applied to the Time-frequency Energy

In this paper, we propose a robust speech detection algorithm with Heteroscedastic Discriminant Analysis (HDA) applied to the Time-Frequency Energy (TFE). The TFE consists of the log energy in the time domain, the log energy in the fixed band 250-3500 Hz, and the log Mel-scale frequency band energies. The bottom-up algorithm with automatic threshold adjustment is used for accurate word boundary detec...

Mel sub-band filtering and compression for robust speech recognition

The Mel-frequency cepstral coefficients (MFCC) are commonly used in speech recognition systems, but they are highly sensitive to the presence of external noise. In this paper, we propose a noise compensation method for Mel filter bank energies and hence for MFCC features. This compensation method is performed in two stages: Mel sub-band filtering and then compression of Mel-sub-band energies. In the compre...

Journal:
  • IEEE Trans. Speech and Audio Processing

Volume 8, Issue 

Pages  -

Publication date 2000